Better Addressing Word Deletion for Statistical Machine Translation
Abstract
Word deletion (WD) problems have a critical impact on translation adequacy and can lead to poor comprehension of the lexical meaning of the translation output. This paper studies in detail how the word deletion problem can be handled in statistical machine translation (SMT). We classify word deletion as desired or undesired, depending on whether the deleted words are spurious or meaningful. We then propose four effective models to handle undesired word deletion. To evaluate word deletion, we develop an automatic evaluation metric that correlates highly with human judgment. Translation systems are tuned simultaneously for the proposed evaluation metric and BLEU using minimum error rate training (MERT). Experimental results on Chinese-to-English translation tasks show that our methods achieve significant improvements on word deletion problems.
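The desired/undesired distinction can be illustrated with a small sketch. It assumes the decoder reports which source positions were covered (aligned to at least one output word), and it uses a hypothetical hand-picked set of spurious Chinese words; the paper's actual criterion for spurious words, its four models, and its evaluation metric are not reproduced here. The `undesired_deletion_rate` helper is an illustrative measure only.

```python
from __future__ import annotations

from typing import Iterable, Sequence

# Hypothetical set of "spurious" source words whose deletion is usually
# harmless. This list is an illustrative assumption, not from the paper.
SPURIOUS_WORDS = {"的", "了", "吗", "呢"}


def classify_deletions(
    source: Sequence[str],
    covered: Iterable[int],
    spurious: set[str] = SPURIOUS_WORDS,
) -> tuple[list[str], list[str]]:
    """Split untranslated source words into desired and undesired deletions.

    `covered` holds the indices of source words aligned to some output word;
    every other source position is treated as deleted.
    """
    covered_set = set(covered)
    desired: list[str] = []
    undesired: list[str] = []
    for i, word in enumerate(source):
        if i in covered_set:
            continue  # word was translated, not deleted
        # Deleting a spurious word is desired; deleting a meaningful one is not.
        (desired if word in spurious else undesired).append(word)
    return desired, undesired


def undesired_deletion_rate(
    source: Sequence[str],
    covered: Iterable[int],
    spurious: set[str] = SPURIOUS_WORDS,
) -> float:
    """Fraction of meaningful (non-spurious) source tokens that were deleted."""
    meaningful = [w for w in source if w not in spurious]
    if not meaningful:
        return 0.0
    _, undesired = classify_deletions(source, covered, spurious)
    return len(undesired) / len(meaningful)
```

For example, with the source `["我", "的", "朋友", "来", "了"]` and only positions 0 and 2 covered, `classify_deletions` returns `(["的", "了"], ["来"])`: dropping the particles is treated as desired, while dropping the verb is an undesired deletion.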
Similar papers
A Hybrid Machine Translation System Based on a Monotone Decoder
In this paper, a hybrid Machine Translation (MT) system is proposed that combines the output of a rule-based machine translation (RBMT) system with a statistical approach. The RBMT uses a set of linguistic rules for translation, which leads to better translation results in terms of word ordering and syntactic structure. SMT, on the other hand, is better at lexical choice. Therefore, in our sys...
Dealing with deletion errors in MT
Content word deletion has widely been identified as a common error in statistical machine translation systems. We refer to a translation error as “content word deletion” when a content word appears in the reference translation of some sentence, but is absent from the system output translation of that sentence. For our purposes, the term “content word” does not have a precise definition, but rep...
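The definition of content word deletion above translates directly into a per-sentence check. This is a minimal sketch under stated assumptions: "content word" is approximated as any token outside a hypothetical function-word list, tokenization is lowercased whitespace splitting with no punctuation handling, and each deleted word is reported once.

```python
from __future__ import annotations

# Hypothetical stand-in for function words. The excerpt notes that "content
# word" has no precise definition, so any such list is an assumption.
FUNCTION_WORDS = {"a", "an", "the", "of", "to", "in", "on", "and", "is", "was"}


def content_word_deletions(
    reference: str,
    hypothesis: str,
    function_words: set[str] = FUNCTION_WORDS,
) -> list[str]:
    """Return reference content words absent from the system output.

    Each deleted word is listed once, in order of first appearance.
    """
    hyp_tokens = set(hypothesis.lower().split())
    deleted = [
        tok
        for tok in reference.lower().split()
        if tok not in function_words and tok not in hyp_tokens
    ]
    # dict.fromkeys removes duplicates while preserving order.
    return list(dict.fromkeys(deleted))
```

For example, `content_word_deletions("The cat sat on the mat", "the cat sat")` returns `["mat"]`: "on" and "the" are skipped as function words, and "cat" and "sat" appear in the output.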
Context Sensitive Word Deletion Model for Statistical Machine Translation
Word deletion (WD) errors can lead to poor comprehension of the meaning of translated source sentences in phrase-based statistical machine translation (SMT), and have a critical impact on the adequacy of the translation results generated by SMT systems. In this paper, we first classify word deletion into two categories, wanted and unwanted word deletions. For these two kinds of word deletio...
Chunk-Based EBMT
Corpus driven machine translation approaches such as Phrase-Based Statistical Machine Translation and Example-Based Machine Translation have been successful by using word alignment to find translation fragments for matched source parts in a bilingual training corpus. However, they still cannot properly deal with systematic translation for insertion or deletion words between two distant language...
An Empirical Study in Source Word Deletion for Phrase-Based Statistical Machine Translation
The treatment of ‘spurious’ source-language words is an important but often ignored problem in the discussion of phrase-based SMT. This paper explains why it is important and why it is not a trivial problem, and proposes three models to handle spurious source words. Experiments show that any source word deletion model can improve a phrase-based system by at least 1.6 BLEU points and the most...